Cluster extraction and annotation strategies on tabular datasets with diverse feature types¶

Importing necessary libraries¶

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import tensorflow as tf
from tensorflow import keras
import umap.umap_ as umap
%config InlineBackend.figure_format = 'svg'

Importing pre-processed data¶

In [2]:
np.random.seed(42)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
data=pd.read_csv('Preprocessed_DM_xx.csv')
In [3]:
np.random.seed(42)
data=data.sample(frac=1) #Shuffle the data set

Feature engineering¶

  • Creating new feature called hypertension
  • Filtering unnecessary details
In [4]:
np.random.seed(42)
HTN_indexes=data.loc[(data['Currently.taking.a.prescribed.medicine.to.lower.BP'] != 0) | (data['First.SYSTOLIC.reading'] >= 140) | (data['First.DIASTOLIC.reading'] >= 90) ].index.values
HTN_cols=np.zeros(data.shape[0])
HTN_cols[[HTN_indexes]]=1
data['HTN']=HTN_cols
data=data.drop(["First.SYSTOLIC.reading","First.DIASTOLIC.reading","Currently.taking.a.prescribed.medicine.to.lower.BP"], axis=1)
data=data.reset_index(drop=True)
data.columns
data=data.drop(["Hb_adjust_alt_smok","Second.SYSTOLIC.reading","Second.DIASTOLIC.reading","Third.SYSTOLIC.reading","Third.DIASTOLIC.reading","Hb_status","Glucose.level",'SBP_status'], axis=1)
data=data.loc[data['BMI'] != 99.99]
data=data.loc[data['Hemoglobin.level..g.dl...1.decimal.'] != 99.99]
data=data.loc[data['Currently.has.asthma'] != .5]
data=data.loc[data['Currently.has.thyroid.disorder'] != .5]
data=data.loc[data['Currently.has.heart.disease'] != .5]
data=data.loc[data['Currently.has.cancer'] != .5]
data=data.loc[data['DM_history'] == 1]
data=data.loc[data['Type.of.caste.or.tribe.of.the.household.head'] != 0]
data=data.loc[data['Time.to.get.to.water.source..minutes.'] != -1]
data=data.drop(["Unnamed: 0","DM_status","DM_history"], axis=1)
In [5]:
np.random.seed(42)
i=[x for x in range(10125)]

data.set_index(pd.Series(i), inplace=True) # Reset the index

Spliting features¶

Creating 2 new dataframes: "data_disease" with features related to disease and "data_others" with rest of the features

In [6]:
data_disease= data[['Currently.has.asthma',
       'Currently.has.thyroid.disorder', 'Currently.has.heart.disease',
       'Currently.has.cancer', 'Suffers.from.TB','HTN']]
In [7]:
data_others= data[['Drinks.alcohol', 'Smoking_stat','Has.refrigerator',
       'Has.bicycle', 'Has.motorcycle.scooter', 'Has.car.truck', 'Owns.livestock..herds.or.farm.animals','Frequency.takes.milk.or.curd',
       'Frequency.eats.pulses.or.beans',
       'Frequency.eats.dark.green.leafy.vegetable', 'Frequency.eats.fruits',
       'Frequency.eats.eggs', 'Frequency.eats.fish',
       'Frequency.eats.chicken.or.meat', 'Frequency.eats.fried.food',
       'Frequency.takes.aerated.drinks','Frequency.household.members.smoke.inside.the.house','Wealth.index',
       'Highest.educational.level', 'Current.age','BMI','Hemoglobin.level..g.dl...1.decimal.','Time.to.get.to.water.source..minutes.', 'Household.head.s.religion', 'Sex', 'Type.of.place.of.residence', 'Household.structure',
       'Type.of.caste.or.tribe.of.the.household.head','Type.of.cooking.fuel','Source.of.drinking.water']]

Function for dimension reduction using UMAP¶

In [8]:
def feature_clustering(UMAP_neb,min_dist_UMAP, metric, data, visual):
    import umap.umap_ as umap
    np.random.seed(42)
    data_embedded = umap.UMAP(n_neighbors=UMAP_neb, min_dist=min_dist_UMAP, n_components=2, metric=metric, random_state=42).fit_transform(data)
    data_embedded[:,0]=(data_embedded[:,0]- np.mean(data_embedded[:,0]))/np.std(data_embedded[:,0])
    data_embedded[:,1]=(data_embedded[:,1]- np.mean(data_embedded[:,1]))/np.std(data_embedded[:,1])
    result = pd.DataFrame(data = data_embedded , 
        columns = ['UMAP_0', 'UMAP_1'])
    if visual==1:
        sns.lmplot( x="UMAP_0", y="UMAP_1",data=result,fit_reg=False,legend=False,scatter_kws={"s": 3},palette=customPalette_set1) # specify the point size
        #plt.savefig('clusters_umap_all.png', dpi=700, bbox_inches='tight')
        plt.show()
    else:
        pass
    return result

Dividing features¶

  • ord_list=ordinal features
  • cont_list=continueous features
  • nom_list=nominal features
In [9]:
ord_list=['Drinks.alcohol', 'Smoking_stat','Has.refrigerator',
       'Has.bicycle', 'Has.motorcycle.scooter', 'Has.car.truck', 'Owns.livestock..herds.or.farm.animals','Frequency.takes.milk.or.curd',
       'Frequency.eats.pulses.or.beans',
       'Frequency.eats.dark.green.leafy.vegetable', 'Frequency.eats.fruits',
       'Frequency.eats.eggs', 'Frequency.eats.fish',
       'Frequency.eats.chicken.or.meat', 'Frequency.eats.fried.food',
       'Frequency.takes.aerated.drinks','Frequency.household.members.smoke.inside.the.house','Wealth.index',
       'Highest.educational.level' ]
cont_list=['Current.age','BMI','Hemoglobin.level..g.dl...1.decimal.','Time.to.get.to.water.source..minutes.']
nom_list=['Household.head.s.religion', 'Sex', 'Type.of.place.of.residence', 'Household.structure',
       'Type.of.caste.or.tribe.of.the.household.head','Type.of.cooking.fuel','Source.of.drinking.water']

Function for feature distributed clustering¶

Function parameters:¶

  • data=dataframe on which feature distributed clustering should be performed
  • cont_list=list of continueous features
  • nom_list=list of nominal features
  • ord_list=list of ordinal features
  • cont_metric=distance metric for continueous data
  • ord_metric=distance metric for ordinal data
  • nom_metric=distance metric for nominal data
  • drop_nominal=1(to drop nominal data) or 0(don't drop nominal data)
  • visual=1(to plot the data) or 0(don't plot the data)
In [10]:
def FDC(data,cont_list,nom_list,ord_list,cont_metric, ord_metric, nom_metric, drop_nominal, visual):
    np.random.seed(42)
    colors_set1 = ["lightcoral", "lightseagreen", "mediumorchid", "orange", "burlywood", "cornflowerblue", "plum", "yellowgreen"]
    customPalette_set1 = sns.set_palette(sns.color_palette(colors_set1))
    cont_df=data[cont_list]
    nom_df=data[nom_list]
    ord_df=data[ord_list]
    cont_emb=feature_clustering(30,0.1, cont_metric, cont_df, 0) #Reducing continueous features into 2dim
    ord_emb=feature_clustering(30,0.1, ord_metric, ord_df, 0) #Reducing ordinal features into 2dim
    nom_emb=feature_clustering(30,0.1, nom_metric, nom_df, 0) #Reducing nominal features into 2dim
    if drop_nominal==1:
        result_concat=pd.concat([ord_emb.drop(['UMAP_1'],axis=1), cont_emb.drop(['UMAP_1'],axis=1), nom_emb.drop(['UMAP_1'],axis=1)],axis=1) #concatinating all reduced dimensions to get 5D embedding(1D from nominal)
    else:
        result_concat=pd.concat([ord_emb, cont_emb, nom_emb],axis=1)
    data_embedded = umap.UMAP(n_neighbors=30, min_dist=0.001, n_components=2, metric='euclidean', random_state=42).fit_transform(result_concat) #reducing 5D embedding to 2D using UMAP
    result_reduced = pd.DataFrame(data = data_embedded , 
        columns = ['UMAP_0', 'UMAP_1'])
    
    if visual==1:
        sns.lmplot( x="UMAP_0", y="UMAP_1",data=result_reduced,fit_reg=False,legend=False,scatter_kws={"s": 3},palette=customPalette_set1) # specify the point size
        plt.show()
        #plt.savefig('clusters_umap_all.png', dpi=700, bbox_inches='tight')
    else:
        pass
    return result_concat, result_reduced #returns both 5D and 2D embedding
In [11]:
# applying Feature Distributed Clustering(FDC) on entire 10125 data with all features except disease features
entire_data_FDC_emb_three,entire_data_FDC_emb_two=FDC(data_others,cont_list,nom_list,ord_list,'euclidean','canberra','hamming',1,1)
2022-08-17T15:58:26.986777 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/

Agglomerative clustering on FDC embedding¶

In [12]:
def aglo_clustering(no_of_clusters, affinity, linkage, three_d_embedding, two_d_embedding,visual,pal):
    np.random.seed(42)
    from sklearn.cluster import AgglomerativeClustering
    ag_cluster = AgglomerativeClustering(n_clusters=no_of_clusters,affinity=affinity,linkage=linkage)
    clusters=ag_cluster.fit_predict(three_d_embedding)
    (values,counts) = np.unique(clusters,return_counts=True)
    two_d_embedding['Cluster'] = clusters
    
    if visual==1:
        sns.lmplot( x="UMAP_0", y="UMAP_1",
        data=two_d_embedding,
        fit_reg=False, 
        legend=True,
        hue='Cluster', # color by cluster
        scatter_kws={"s": 3},palette=pal) # specify the point size
        plt.savefig('agglo_ref_3dim.png', dpi=700, bbox_inches='tight')
        plt.show()
    else:
        pass
    return two_d_embedding.Cluster.to_list(),counts
In [13]:
#setting color palette for visualization of clusters
colors_set1 = ['lightcoral','cornflowerblue','orange','mediumorchid', 'lightseagreen','olive', 'chocolate','steelblue']
customPalette_set1 = sns.set_palette(sns.color_palette(colors_set1))


#Applying clustering algorithm on FDC embeddings from entire data
entire_data_cluster_list,entire_data_cluster_counts=aglo_clustering(4, "euclidean", "ward",entire_data_FDC_emb_three,entire_data_FDC_emb_two,1,customPalette_set1)
2022-08-17T15:58:31.587028 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
In [14]:
#Getting noise indices
non_noise_indices= np.where(np.array(entire_data_cluster_list)!=-1)

#Removing noise/outlires from FDC embeddings and from entire data
entire_data_FDC_emb_three= entire_data_FDC_emb_three.iloc[non_noise_indices]
entire_data_FDC_emb_two= entire_data_FDC_emb_two.iloc[non_noise_indices]
entire_data_cluster_list= np.array(entire_data_cluster_list)[non_noise_indices]
data_others= data_others.iloc[non_noise_indices]

#Creating new cloumn for storing cluster labels 
data_others['cluster_labels']= entire_data_cluster_list

#getting binary representation for cluster labels
data_others= pd.get_dummies(data=data_others, columns=['cluster_labels'])
In [15]:
#Getting column names of encoded cluster labels
cluster_column_names=data_others.columns[-len(np.unique(entire_data_cluster_list)):].to_list()

Dividing data set for experiments¶

In [16]:
#75%  of entire data for training
np.random.seed(42)
data=data_others.sample(frac=0.75) # Training data
In [17]:
#Another 25% of entire data for validation
np.random.seed(42)
data_val=data_others.drop(data.index) # Validation data

Dividing training data into 3 folds¶

In [18]:
#Dividing training data into three folds

np.random.seed(42)
df_1=data.sample(frac=0.33) #fold 1

df=data.drop(df_1.index)
df_2=df.sample(frac=0.51) #fold 2

df_3=df.drop(df_2.index) #fold 3
In [19]:
np.random.seed(42)
#Possible combinations of concating 2 folds for training and using remaining fold for testing
training_folds=[pd.concat([df_1,df_2],axis=0), pd.concat([df_2,df_3],axis=0), pd.concat([df_3,df_1],axis=0)]
testing_folds=[df_3,df_1,df_2]

Function for neural network¶

Function parameters:¶

  • n_features= dimension of input data
  • hidden_dim1= dimension of first hidden layer
  • hidden_dim2= dimension of second hidden layer
  • out_emb_size= dimension of output data
  • act1= first hidden layer activation function
  • act2= second hidden layer activation function
In [20]:
def neural_network(n_features,hidden_dim1,hidden_dim2,out_emb_size,act1,act2,loss):
    np.random.seed(42)
    tf.random.set_seed(42)
    model=keras.Sequential([
         keras.layers.Dense(hidden_dim1,input_dim=n_features,activation=act1),
         keras.layers.Dense(hidden_dim1,activation=act2),
         keras.layers.Dense(out_emb_size)])
    model.compile(optimizer="adam" ,
              loss=loss, 
              metrics=['mse'])
    return model    

Function for Cluster Incidence Matrix (CIM)¶

  • creating a matrix to evaluate the performance based on predicted cluster labels
In [21]:
def cluster_incidence_matrix_mod(cluster_list_new):
    np.random.seed(42)
    
    matrix=np.zeros((len(cluster_list_new),len(cluster_list_new)))
    for i in range(len(cluster_list_new)):
        for j in range(len(cluster_list_new)):
                if cluster_list_new[i]==cluster_list_new[j]:
                    matrix[i,j]=1 
                else:
                    pass
    
    return matrix 
In [22]:
#Function for decoding the encoded cluster labels
def label_decoder(label_dataframe):
    label_array=np.array(label_dataframe)
    decoded_labels=[]
    for i in label_array:
        max_val=np.argmax(i)
        decoded_labels.append(max_val)
    return decoded_labels
In [23]:
colnames=[]
for i in range(len(entire_data_FDC_emb_three.columns)):
    colnames.append('c'+str(i+1))
In [24]:
np.random.seed(42)
count=0
fold_readings=[]
while count<3:
    FDC_emb_three_train=entire_data_FDC_emb_three.loc[list(training_folds[count].index)] #3D FDC embedding of training folds from entire training data
    FDC_emb_two_train=entire_data_FDC_emb_two.loc[list(training_folds[count].index)] #2D embedding of training folds from entire training data
    FDC_emb_three_train.columns=colnames
    
    #Thirty dimensional data of training fold as features_matrix(X_train) 
    features_matrix=np.array(training_folds[count].drop(cluster_column_names, axis=1,inplace=False)) #X_train
    
    #three dimensional FDC embeddings of training fold as target_matrix(y_train)
    target_matrix=np.array(FDC_emb_three_train) #y_train
    
    #Train a neural network to get five dimensional embedding
    model_1=neural_network(len(features_matrix[0]),int(0.6*len(features_matrix[0])),int(0.36*len(features_matrix[0])),len(target_matrix[0]),"relu","sigmoid","mse")
    history=model_1.fit(features_matrix,target_matrix,epochs=30,batch_size=8)
    print('\n')
    print('Training history across epochs for fold ',count+1)
    plt.plot(history.history['mse'],'r')
    plt.ylabel('mse')
    plt.xlabel('epoch')
    plt.show()
    
    #Using same thirty dimensional features_matrix(X_train) from first neural network and encoded cluster labels of training fold as target_labels_matrix(y_train) 
    target_labels_matrix=np.array(training_folds[count].loc[:,cluster_column_names]) #y
    
    
    #Train a neural network to get encoded cluster labels
    model_2=neural_network(len(features_matrix[0]),int(0.6*len(features_matrix[0])),int(0.36*len(features_matrix[0])),len(target_labels_matrix[0]),"relu","softmax","mse")
    history=model_2.fit(features_matrix,target_labels_matrix,epochs=30,batch_size=8)
    print('\n')
    print('Training history across epochs for fold ',count+1)
    plt.plot(history.history['mse'],'r')
    plt.ylabel('mse')
    plt.xlabel('epoch')
    plt.show()
    
    #Decoding cluster labels of training fold
    decoded_target_labels_matrix=label_decoder(target_labels_matrix)

    #Actual encoded cluster labels of testing fold for metric calculation  
    ref_clusters=testing_folds[count].loc[:,cluster_column_names] 
    #Decoding encoded cluster labels of testing fold
    decoded_ref_clusters=label_decoder(ref_clusters)
    

    #predicting testing fold to get three dim embedding using trained model_1
    testing_data=testing_folds[count].drop(cluster_column_names, axis=1,inplace=False)
    predicted_3dim=pd.DataFrame(model_1.predict(testing_data), columns=colnames)
    
    #UMAP on predicted 3D embedding
    predicted_2dim=feature_clustering(30,0.01, "euclidean", predicted_3dim, 0)

    #predicting testing fold to get encoded cluster labels using trained model_2
    predicted_clusters=pd.DataFrame(model_2.predict(testing_data))
    
    #Decoding predicted encoded cluster labels
    decoded_predicted_clusters=label_decoder(predicted_clusters)
    
    
    #concatinating training and predicted 3D embedding
    concatenated_3dim=pd.concat([FDC_emb_three_train,predicted_3dim])
    
    #UMAP on concatinated embedding
    two_dim_viz=feature_clustering(30, 0.01, 'euclidean', concatenated_3dim, 0)
    
    #Concatinating decoded cluster labels of training fold and predicted testing fold
    concatenated_cluster_labels=np.concatenate([np.array(decoded_target_labels_matrix),np.array(decoded_predicted_clusters)+len(np.unique(decoded_target_labels_matrix))])
    
    two_dim_viz['Cluster']= concatenated_cluster_labels
    
    
    #Setting dark colors for training folds    
    darkerhues=['lightcoral','cornflowerblue','orange','mediumorchid', 'lightseagreen','olive', 'chocolate','steelblue']
    colors_set2=[]
    for i in range(len(np.unique(decoded_target_labels_matrix))):
        colors_set2.append(darkerhues[i])
    
    #Concatinating dark colors for training folds and corresponding light colors for testing fold
    colors_set2=colors_set2+["lightpink", 'skyblue', 'wheat', "plum","paleturquoise",  "lightgreen",  'burlywood','lightsteelblue']
    
    print('Vizualization for FDC for training fold (shown in dark hue) '+str(count+1) + 'and predicted clusters from neural network on testing fold (shown in corresponding light hues) '+str(count+1))
    
    #visualizing the clusters of both training and testing folds
    sns.lmplot( x="UMAP_0", y="UMAP_1", data=two_dim_viz, fit_reg=False, legend=False, hue='Cluster', scatter_kws={"s": 3},palette=sns.set_palette(sns.color_palette(colors_set2))) 
    plt.show()
    
    #Metric calculation

    CIM_predicted=cluster_incidence_matrix_mod(np.array(decoded_predicted_clusters))#Cluster incidence metric for predicted clusters
    CIM_reference=cluster_incidence_matrix_mod(np.array(decoded_ref_clusters))#Cluster incidence metric for reference clusters
    Product=np.dot(CIM_predicted,CIM_reference)
    cluster_incdences_in_data=np.sum(CIM_reference,axis=1)  
    mean_points_in_same_clusters=np.mean(np.diagonal(Product)/cluster_incdences_in_data)
    fold_readings.append(mean_points_in_same_clusters*100)
    
    print("Average percentage of patients belongs to the same cluster is: {}%".format(mean_points_in_same_clusters*100))
    print('\n')
    count+=1


print('\n')
print('\n')
Epoch 1/30
638/638 [==============================] - 1s 1ms/step - loss: 0.6788 - mse: 0.6788
Epoch 2/30
638/638 [==============================] - 1s 1ms/step - loss: 0.4396 - mse: 0.4396
Epoch 3/30
638/638 [==============================] - 1s 1ms/step - loss: 0.4002 - mse: 0.4002
Epoch 4/30
638/638 [==============================] - 1s 973us/step - loss: 0.3827 - mse: 0.3827
Epoch 5/30
638/638 [==============================] - 1s 1ms/step - loss: 0.3710 - mse: 0.3710
Epoch 6/30
638/638 [==============================] - 1s 1ms/step - loss: 0.3630 - mse: 0.3630
Epoch 7/30
638/638 [==============================] - 1s 988us/step - loss: 0.3539 - mse: 0.3539
Epoch 8/30
638/638 [==============================] - 1s 993us/step - loss: 0.3466 - mse: 0.3466
Epoch 9/30
638/638 [==============================] - 1s 1ms/step - loss: 0.3398 - mse: 0.3398
Epoch 10/30
638/638 [==============================] - 1s 981us/step - loss: 0.3337 - mse: 0.3337
Epoch 11/30
638/638 [==============================] - 1s 1ms/step - loss: 0.3292 - mse: 0.3292
Epoch 12/30
638/638 [==============================] - 1s 1ms/step - loss: 0.3264 - mse: 0.3264
Epoch 13/30
638/638 [==============================] - 1s 981us/step - loss: 0.3211 - mse: 0.3211
Epoch 14/30
638/638 [==============================] - 1s 1ms/step - loss: 0.3155 - mse: 0.3155
Epoch 15/30
638/638 [==============================] - 1s 1ms/step - loss: 0.3090 - mse: 0.3090
Epoch 16/30
638/638 [==============================] - 1s 1ms/step - loss: 0.3016 - mse: 0.3016
Epoch 17/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2937 - mse: 0.2937
Epoch 18/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2854 - mse: 0.2854
Epoch 19/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2722 - mse: 0.2722
Epoch 20/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2612 - mse: 0.2612
Epoch 21/30
638/638 [==============================] - 1s 978us/step - loss: 0.2506 - mse: 0.2506
Epoch 22/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2406 - mse: 0.2406
Epoch 23/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2314 - mse: 0.2314
Epoch 24/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2236 - mse: 0.2236
Epoch 25/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2177 - mse: 0.2177
Epoch 26/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2111 - mse: 0.2111
Epoch 27/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2073 - mse: 0.2073
Epoch 28/30
638/638 [==============================] - 1s 983us/step - loss: 0.2033 - mse: 0.2033
Epoch 29/30
638/638 [==============================] - 1s 1ms/step - loss: 0.2000 - mse: 0.2000
Epoch 30/30
638/638 [==============================] - 1s 974us/step - loss: 0.1958 - mse: 0.1958


Training history across epochs for fold  1
2022-08-17T15:58:53.517242 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
Epoch 1/30
638/638 [==============================] - 1s 997us/step - loss: 0.1348 - mse: 0.1348
Epoch 2/30
638/638 [==============================] - 1s 986us/step - loss: 0.0553 - mse: 0.0553
Epoch 3/30
638/638 [==============================] - 1s 996us/step - loss: 0.0393 - mse: 0.0393
Epoch 4/30
638/638 [==============================] - 1s 970us/step - loss: 0.0380 - mse: 0.0380
Epoch 5/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0373 - mse: 0.0373
Epoch 6/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0367 - mse: 0.0367
Epoch 7/30
638/638 [==============================] - 1s 993us/step - loss: 0.0361 - mse: 0.0361
Epoch 8/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0342 - mse: 0.0342
Epoch 9/30
638/638 [==============================] - 1s 992us/step - loss: 0.0318 - mse: 0.0318
Epoch 10/30
638/638 [==============================] - 1s 996us/step - loss: 0.0300 - mse: 0.0300
Epoch 11/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0286 - mse: 0.0286
Epoch 12/30
638/638 [==============================] - 1s 973us/step - loss: 0.0273 - mse: 0.0273
Epoch 13/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0268 - mse: 0.0268
Epoch 14/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0262 - mse: 0.0262
Epoch 15/30
638/638 [==============================] - 1s 999us/step - loss: 0.0253 - mse: 0.0253
Epoch 16/30
638/638 [==============================] - 1s 992us/step - loss: 0.0245 - mse: 0.0245
Epoch 17/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0242 - mse: 0.0242
Epoch 18/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0241 - mse: 0.0241
Epoch 19/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0235 - mse: 0.0235
Epoch 20/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0233 - mse: 0.0233
Epoch 21/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0229 - mse: 0.0229
Epoch 22/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0228 - mse: 0.0228
Epoch 23/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0221 - mse: 0.0221
Epoch 24/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0231 - mse: 0.0231
Epoch 25/30
638/638 [==============================] - 1s 993us/step - loss: 0.0224 - mse: 0.0224
Epoch 26/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0219 - mse: 0.0219
Epoch 27/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0217 - mse: 0.0217
Epoch 28/30
638/638 [==============================] - 1s 986us/step - loss: 0.0216 - mse: 0.0216
Epoch 29/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0212 - mse: 0.0212
Epoch 30/30
638/638 [==============================] - 1s 1ms/step - loss: 0.0212 - mse: 0.0212


Training history across epochs for fold  1
2022-08-17T15:59:13.568404 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
78/78 [==============================] - 0s 709us/step
78/78 [==============================] - 0s 686us/step
Vizualization for FDC for training fold (shown in dark hue) 1and predicted clusters from neural network on testing fold (shown in corresponding light hues) 1
2022-08-17T15:59:38.552602 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
Average percentage of patients belongs to the same cluster is: 91.48227143075135%


Epoch 1/30
636/636 [==============================] - 1s 988us/step - loss: 0.6676 - mse: 0.6676
Epoch 2/30
636/636 [==============================] - 1s 1ms/step - loss: 0.4311 - mse: 0.4311
Epoch 3/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3923 - mse: 0.3923
Epoch 4/30
636/636 [==============================] - 1s 999us/step - loss: 0.3757 - mse: 0.3757
Epoch 5/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3644 - mse: 0.3644
Epoch 6/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3562 - mse: 0.3562
Epoch 7/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3507 - mse: 0.3507
Epoch 8/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3455 - mse: 0.3455
Epoch 9/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3397 - mse: 0.3397
Epoch 10/30
636/636 [==============================] - 1s 988us/step - loss: 0.3364 - mse: 0.3364
Epoch 11/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3332 - mse: 0.3332
Epoch 12/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3295 - mse: 0.3295
Epoch 13/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3266 - mse: 0.3266
Epoch 14/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3234 - mse: 0.3234
Epoch 15/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3208 - mse: 0.3208
Epoch 16/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3183 - mse: 0.3183
Epoch 17/30
636/636 [==============================] - 1s 1ms/step - loss: 0.3135 - mse: 0.3135
Epoch 18/30
636/636 [==============================] - 1s 947us/step - loss: 0.3095 - mse: 0.3095
Epoch 19/30
636/636 [==============================] - 1s 958us/step - loss: 0.3057 - mse: 0.3057
Epoch 20/30
636/636 [==============================] - 1s 993us/step - loss: 0.3014 - mse: 0.3014
Epoch 21/30
636/636 [==============================] - 1s 1ms/step - loss: 0.2952 - mse: 0.2952
Epoch 22/30
636/636 [==============================] - 1s 987us/step - loss: 0.2908 - mse: 0.2908
Epoch 23/30
636/636 [==============================] - 1s 981us/step - loss: 0.2849 - mse: 0.2849
Epoch 24/30
636/636 [==============================] - 1s 1ms/step - loss: 0.2726 - mse: 0.2726
Epoch 25/30
636/636 [==============================] - 1s 1ms/step - loss: 0.2544 - mse: 0.2544
Epoch 26/30
636/636 [==============================] - 1s 1ms/step - loss: 0.2415 - mse: 0.2415
Epoch 27/30
636/636 [==============================] - 1s 1ms/step - loss: 0.2304 - mse: 0.2304
Epoch 28/30
636/636 [==============================] - 1s 1ms/step - loss: 0.2205 - mse: 0.2205
Epoch 29/30
636/636 [==============================] - 1s 1ms/step - loss: 0.2130 - mse: 0.2130
Epoch 30/30
636/636 [==============================] - 1s 1ms/step - loss: 0.2063 - mse: 0.2063


Training history across epochs for fold  2
2022-08-17T16:00:03.034894 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
Epoch 1/30
636/636 [==============================] - 1s 1ms/step - loss: 0.1343 - mse: 0.1343
Epoch 2/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0608 - mse: 0.0608
Epoch 3/30
636/636 [==============================] - 1s 974us/step - loss: 0.0380 - mse: 0.0380
Epoch 4/30
636/636 [==============================] - 1s 972us/step - loss: 0.0364 - mse: 0.0364
Epoch 5/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0359 - mse: 0.0359
Epoch 6/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0354 - mse: 0.0354
Epoch 7/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0342 - mse: 0.0342
Epoch 8/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0326 - mse: 0.0326
Epoch 9/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0309 - mse: 0.0309
Epoch 10/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0295 - mse: 0.0295
Epoch 11/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0282 - mse: 0.0282
Epoch 12/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0269 - mse: 0.0269
Epoch 13/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0263 - mse: 0.0263
Epoch 14/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0257 - mse: 0.0257
Epoch 15/30
636/636 [==============================] - 1s 975us/step - loss: 0.0246 - mse: 0.0246
Epoch 16/30
636/636 [==============================] - 1s 983us/step - loss: 0.0243 - mse: 0.0243
Epoch 17/30
636/636 [==============================] - 1s 999us/step - loss: 0.0241 - mse: 0.0241
Epoch 18/30
636/636 [==============================] - 1s 981us/step - loss: 0.0232 - mse: 0.0232
Epoch 19/30
636/636 [==============================] - 1s 963us/step - loss: 0.0230 - mse: 0.0230
Epoch 20/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0225 - mse: 0.0225
Epoch 21/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0228 - mse: 0.0228
Epoch 22/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0221 - mse: 0.0221
Epoch 23/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0220 - mse: 0.0220
Epoch 24/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0218 - mse: 0.0218
Epoch 25/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0218 - mse: 0.0218
Epoch 26/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0211 - mse: 0.0211
Epoch 27/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0210 - mse: 0.0210
Epoch 28/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0209 - mse: 0.0209
Epoch 29/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0204 - mse: 0.0204
Epoch 30/30
636/636 [==============================] - 1s 1ms/step - loss: 0.0201 - mse: 0.0201


Training history across epochs for fold  2
2022-08-17T16:00:24.304798 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
79/79 [==============================] - 0s 703us/step
79/79 [==============================] - 0s 713us/step
Vizualization for FDC for training fold (shown in dark hue) 2and predicted clusters from neural network on testing fold (shown in corresponding light hues) 2
2022-08-17T16:00:48.153499 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
Average percentage of patients belongs to the same cluster is: 92.48612119897409%


Epoch 1/30
625/625 [==============================] - 1s 992us/step - loss: 0.6540 - mse: 0.6540
Epoch 2/30
625/625 [==============================] - 1s 1ms/step - loss: 0.4236 - mse: 0.4236
Epoch 3/30
625/625 [==============================] - 1s 1ms/step - loss: 0.3862 - mse: 0.3862
Epoch 4/30
625/625 [==============================] - 1s 1ms/step - loss: 0.3700 - mse: 0.3700
Epoch 5/30
625/625 [==============================] - 1s 1ms/step - loss: 0.3599 - mse: 0.3599
Epoch 6/30
625/625 [==============================] - 1s 1ms/step - loss: 0.3518 - mse: 0.3518
Epoch 7/30
625/625 [==============================] - 1s 1ms/step - loss: 0.3454 - mse: 0.3454
Epoch 8/30
625/625 [==============================] - 1s 980us/step - loss: 0.3364 - mse: 0.3364
Epoch 9/30
625/625 [==============================] - 1s 1ms/step - loss: 0.3277 - mse: 0.3277
Epoch 10/30
625/625 [==============================] - 1s 1ms/step - loss: 0.3172 - mse: 0.3172
Epoch 11/30
625/625 [==============================] - 1s 1ms/step - loss: 0.3052 - mse: 0.3052
Epoch 12/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2907 - mse: 0.2907
Epoch 13/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2773 - mse: 0.2773
Epoch 14/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2651 - mse: 0.2651
Epoch 15/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2568 - mse: 0.2568
Epoch 16/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2468 - mse: 0.2468
Epoch 17/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2395 - mse: 0.2395
Epoch 18/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2316 - mse: 0.2316
Epoch 19/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2270 - mse: 0.2270
Epoch 20/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2235 - mse: 0.2235
Epoch 21/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2182 - mse: 0.2182
Epoch 22/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2159 - mse: 0.2159
Epoch 23/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2125 - mse: 0.2125
Epoch 24/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2088 - mse: 0.2088
Epoch 25/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2098 - mse: 0.2098
Epoch 26/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2037 - mse: 0.2037
Epoch 27/30
625/625 [==============================] - 1s 1ms/step - loss: 0.2033 - mse: 0.2033
Epoch 28/30
625/625 [==============================] - 1s 1ms/step - loss: 0.1992 - mse: 0.1992
Epoch 29/30
625/625 [==============================] - 1s 1ms/step - loss: 0.1988 - mse: 0.1988
Epoch 30/30
625/625 [==============================] - 1s 1ms/step - loss: 0.1965 - mse: 0.1965


Training history across epochs for fold  3
2022-08-17T16:01:13.991936 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
Epoch 1/30
625/625 [==============================] - 1s 1ms/step - loss: 0.1353 - mse: 0.1353
Epoch 2/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0558 - mse: 0.0558
Epoch 3/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0382 - mse: 0.0382
Epoch 4/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0365 - mse: 0.0365
Epoch 5/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0356 - mse: 0.0356
Epoch 6/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0352 - mse: 0.0352
Epoch 7/30
625/625 [==============================] - 1s 994us/step - loss: 0.0348 - mse: 0.0348
Epoch 8/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0344 - mse: 0.0344
Epoch 9/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0319 - mse: 0.0319
Epoch 10/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0295 - mse: 0.0295
Epoch 11/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0275 - mse: 0.0275
Epoch 12/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0263 - mse: 0.0263
Epoch 13/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0250 - mse: 0.0250
Epoch 14/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0240 - mse: 0.0240
Epoch 15/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0240 - mse: 0.0240
Epoch 16/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0231 - mse: 0.0231
Epoch 17/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0227 - mse: 0.0227
Epoch 18/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0217 - mse: 0.0217
Epoch 19/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0214 - mse: 0.0214
Epoch 20/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0210 - mse: 0.0210
Epoch 21/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0206 - mse: 0.0206
Epoch 22/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0210 - mse: 0.0210
Epoch 23/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0201 - mse: 0.0201
Epoch 24/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0195 - mse: 0.0195
Epoch 25/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0198 - mse: 0.0198
Epoch 26/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0196 - mse: 0.0196
Epoch 27/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0194 - mse: 0.0194
Epoch 28/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0188 - mse: 0.0188
Epoch 29/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0187 - mse: 0.0187
Epoch 30/30
625/625 [==============================] - 1s 1ms/step - loss: 0.0187 - mse: 0.0187


Training history across epochs for fold  3
2022-08-17T16:01:35.181181 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
82/82 [==============================] - 0s 714us/step
82/82 [==============================] - 0s 771us/step
Vizualization for FDC for training fold (shown in dark hue) 3and predicted clusters from neural network on testing fold (shown in corresponding light hues) 3
2022-08-17T16:01:59.775142 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
Average percentage of patients belongs to the same cluster is: 90.48960959258635%






In [25]:
print('Average percentage of patients belonging to the same cluster over all three folds:', np.mean(np.array(fold_readings)))
Average percentage of patients belonging to the same cluster over all three folds: 91.48600074077059

Validation¶

In [26]:
np.random.seed(42)

FDC_emb_three_data=entire_data_FDC_emb_three.loc[list(data.index)] #5D FDC embedding of training fold from entire data
FDC_emb_two_data=entire_data_FDC_emb_two.loc[list(data.index)] #2D embedding of training fold from entire data
FDC_emb_three_data.columns=colnames

#Thirty dimensional data of training fold as features_matrix(X_train) 
features_matrix=np.array(data.drop(cluster_column_names, axis=1,inplace=False)) #X_train

#Five dimensional FDC embedding of training fold as target_matrix(y_train)
target_matrix=np.array(FDC_emb_three_data) #y_train

#Train a neural network to get five dimensional embedding
model_1=neural_network(len(features_matrix[0]),int(0.6*len(features_matrix[0])),int(0.36*len(features_matrix[0])),len(target_matrix[0]),"relu","sigmoid","mse")
history=model_1.fit(features_matrix,target_matrix,epochs=30,batch_size=8)
print('\n')
print('Training history across epochs for training data ')
plt.plot(history.history['mse'],'r')
plt.ylabel('mse')
plt.xlabel('epoch')
plt.show()

#Using same thirty dimensional features_matrix(X_train) from first neural network and encoded cluster labels of training fold as target_labels_matrix(y_train) 
target_labels_matrix=np.array(data.loc[:,cluster_column_names]) #y


#Train a neural network to get encoded cluster labels
model_2=neural_network(len(features_matrix[0]),int(0.6*len(features_matrix[0])),int(0.36*len(features_matrix[0])),len(target_labels_matrix[0]),"relu","softmax","mse")
history=model_2.fit(features_matrix,target_labels_matrix,epochs=30,batch_size=8)
print('\n')
print('Training history across epochs for training data ')
plt.plot(history.history['mse'],'r')
plt.ylabel('mse')
plt.xlabel('epoch')
plt.show()

#Decoding cluster labels of training fold
decoded_target_labels_matrix=label_decoder(target_labels_matrix)

#Actual encoded cluster labels of validation data for metric calculation  
ref_clusters=data_val.loc[:,cluster_column_names] 
#Decoding encoded cluster labels of validation data
decoded_ref_clusters=label_decoder(ref_clusters)


#predicting validation data to get five dim embedding using trained model_1
validation_data=data_val.drop(cluster_column_names, axis=1,inplace=False)
predicted_3dim=pd.DataFrame(model_1.predict(validation_data), columns=colnames)

#UMAP on predicted 5D embedding
predicted_2dim=feature_clustering(30,0.01, "euclidean", predicted_3dim, 0)

#predicting validation data to get encoded cluster labels using trained model_2
predicted_clusters=pd.DataFrame(model_2.predict(validation_data))

#Decoding predicted encoded cluster labels
decoded_predicted_clusters=label_decoder(predicted_clusters)


#concatinating training and predicted 5D embedding
concatenated_3dim=pd.concat([FDC_emb_three_data,predicted_3dim])

#UMAP on concatinated embedding
two_dim_viz=feature_clustering(30, 0.01, 'euclidean', concatenated_3dim, 0)

#Concatinating decoded cluster labels of training data and predicted validation data
concatenated_cluster_labels=np.concatenate([np.array(decoded_target_labels_matrix),np.array(decoded_predicted_clusters)+len(np.unique(decoded_target_labels_matrix))])

two_dim_viz['Cluster']= concatenated_cluster_labels



#Setting dark colors for training data    
darkerhues=['lightcoral','cornflowerblue','orange','mediumorchid', 'lightseagreen','olive', 'chocolate','steelblue']
colors_set2=[]
for i in range(len(np.unique(decoded_target_labels_matrix))):
    colors_set2.append(darkerhues[i])

#Concatinating dark colors for training data and corresponding light colors for validation data
colors_set2=colors_set2+["lightpink", 'skyblue', 'wheat', "plum","paleturquoise",  "lightgreen",  'burlywood','lightsteelblue']

print('Vizualization for FDC for training data (shown in dark hue) '+ 'and predicted clusters from neural network on validation data (shown in corresponding light hues) ')

#visualizing the clusters of both training and validation data
sns.lmplot( x="UMAP_0", y="UMAP_1", data=two_dim_viz, fit_reg=False, legend=False, hue='Cluster', scatter_kws={"s": 3},palette=sns.set_palette(sns.color_palette(colors_set2))) 
plt.show()

#Metric calculation

CIM_predicted=cluster_incidence_matrix_mod(np.array(decoded_predicted_clusters))#Cluster incidence metric for predicted clusters
CIM_reference=cluster_incidence_matrix_mod(np.array(decoded_ref_clusters))#Cluster incidence metric for reference clusters
Product=np.dot(CIM_predicted,CIM_reference)
cluster_incidences_in_data=np.sum(CIM_reference,axis=1)  
mean_points_in_same_clusters=np.mean(np.diagonal(Product)/cluster_incidences_in_data)
fold_readings.append(mean_points_in_same_clusters*100)

print("Average percentage of patients belongs to the same cluster is: {}%".format(mean_points_in_same_clusters*100))
print('\n')



print('\n')
print('\n')
Epoch 1/30
950/950 [==============================] - 2s 1ms/step - loss: 0.5901 - mse: 0.5901
Epoch 2/30
950/950 [==============================] - 1s 1ms/step - loss: 0.3964 - mse: 0.3964
Epoch 3/30
950/950 [==============================] - 1s 1ms/step - loss: 0.3718 - mse: 0.3718
Epoch 4/30
950/950 [==============================] - 1s 1ms/step - loss: 0.3571 - mse: 0.3571
Epoch 5/30
950/950 [==============================] - 1s 1ms/step - loss: 0.3479 - mse: 0.3479
Epoch 6/30
950/950 [==============================] - 1s 1ms/step - loss: 0.3402 - mse: 0.3402
Epoch 7/30
950/950 [==============================] - 1s 1ms/step - loss: 0.3287 - mse: 0.3287
Epoch 8/30
950/950 [==============================] - 1s 1ms/step - loss: 0.3157 - mse: 0.3157
Epoch 9/30
950/950 [==============================] - 1s 1ms/step - loss: 0.3020 - mse: 0.3020
Epoch 10/30
950/950 [==============================] - 1s 1ms/step - loss: 0.2823 - mse: 0.2823
Epoch 11/30
950/950 [==============================] - 1s 1ms/step - loss: 0.2641 - mse: 0.2641
Epoch 12/30
950/950 [==============================] - 1s 1ms/step - loss: 0.2486 - mse: 0.2486
Epoch 13/30
950/950 [==============================] - 1s 1ms/step - loss: 0.2360 - mse: 0.2360
Epoch 14/30
950/950 [==============================] - 1s 1ms/step - loss: 0.2247 - mse: 0.2247
Epoch 15/30
950/950 [==============================] - 1s 1ms/step - loss: 0.2153 - mse: 0.2153
Epoch 16/30
950/950 [==============================] - 1s 1ms/step - loss: 0.2073 - mse: 0.2073
Epoch 17/30
950/950 [==============================] - 1s 1ms/step - loss: 0.2017 - mse: 0.2017
Epoch 18/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1966 - mse: 0.1966
Epoch 19/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1934 - mse: 0.1934
Epoch 20/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1877 - mse: 0.1877
Epoch 21/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1841 - mse: 0.1841
Epoch 22/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1835 - mse: 0.1835
Epoch 23/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1778 - mse: 0.1778
Epoch 24/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1751 - mse: 0.1751
Epoch 25/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1753 - mse: 0.1753
Epoch 26/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1738 - mse: 0.1738
Epoch 27/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1686 - mse: 0.1686
Epoch 28/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1688 - mse: 0.1688
Epoch 29/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1645 - mse: 0.1645
Epoch 30/30
950/950 [==============================] - 1s 1ms/step - loss: 0.1640 - mse: 0.1640


Training history across epochs for training data 
2022-08-17T16:02:37.418691 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
Epoch 1/30
950/950 [==============================] - 2s 1ms/step - loss: 0.1165 - mse: 0.1165
Epoch 2/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0414 - mse: 0.0414
Epoch 3/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0371 - mse: 0.0371
Epoch 4/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0363 - mse: 0.0363
Epoch 5/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0355 - mse: 0.0355
Epoch 6/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0349 - mse: 0.0349
Epoch 7/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0322 - mse: 0.0322
Epoch 8/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0292 - mse: 0.0292
Epoch 9/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0276 - mse: 0.0276
Epoch 10/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0263 - mse: 0.0263
Epoch 11/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0252 - mse: 0.0252
Epoch 12/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0240 - mse: 0.0240
Epoch 13/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0236 - mse: 0.0236
Epoch 14/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0237 - mse: 0.0237
Epoch 15/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0223 - mse: 0.0223
Epoch 16/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0222 - mse: 0.0222
Epoch 17/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0219 - mse: 0.0219
Epoch 18/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0211 - mse: 0.0211
Epoch 19/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0212 - mse: 0.0212
Epoch 20/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0207 - mse: 0.0207
Epoch 21/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0204 - mse: 0.0204
Epoch 22/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0203 - mse: 0.0203
Epoch 23/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0199 - mse: 0.0199
Epoch 24/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0196 - mse: 0.0196
Epoch 25/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0199 - mse: 0.0199
Epoch 26/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0194 - mse: 0.0194
Epoch 27/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0191 - mse: 0.0191
Epoch 28/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0196 - mse: 0.0196
Epoch 29/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0186 - mse: 0.0186
Epoch 30/30
950/950 [==============================] - 1s 1ms/step - loss: 0.0187 - mse: 0.0187


Training history across epochs for training data 
2022-08-17T16:03:09.343606 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
80/80 [==============================] - 0s 716us/step
80/80 [==============================] - 0s 825us/step
Vizualization for FDC for training data (shown in dark hue) and predicted clusters from neural network on validation data (shown in corresponding light hues) 
2022-08-17T16:03:28.549920 image/svg+xml Matplotlib v3.5.1, https://matplotlib.org/
Average percentage of patients belongs to the same cluster is: 91.26679427868682%






In [ ]: